System and Method for Efficiently Generating Association Rules Using Scaled Lift Threshold Values to Subsume Association Rules

ABSTRACT

A data processing system processes data sets (such as low-resolution transaction data) into high-resolution data sets by mapping generic information into attribute-based specific information that may be processed to identify frequent sets therein. When association rules are generated from such frequent sets, the complexity and/or quantity of such rules may be managed by removing redundancies from the rules, such as by removing rules providing only trivial associations, removing rules having only a part group as the consequent, modifying rules to remove redundant antecedent items and/or filtering subsumed rules from the generated rule set that do not provide sufficient lift to meet an adjustable specialization lift threshold requirement.

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates in general to the field of database analysis. In one aspect, the present invention relates to a system and method for data mining operations for identifying association rules contained in database records.

Description of the Related Art

The ability of modern computers to assemble, record and analyze enormous amounts of data has created a field of database analysis referred to as data mining. Data mining is used to discover association relationships in a database by identifying frequently occurring patterns in the database. These association relationships or rules may be applied to extract useful information from large databases in a variety of fields, including selective marketing, market analysis and management applications (such as target marketing, customer relation management, market basket analysis, cross selling, market segmentation), risk analysis and management applications (such as forecasting, customer retention, improved underwriting, quality control, competitive analysis), fraud detection and management applications and other applications (such as text mining (news group, email, documents), stream data mining, web mining, DNA data analysis, etc.). Association rules have been applied to model and emulate consumer purchasing activities by describing how often items are purchased together. Typically, a rule consists of two conditions (e.g., antecedent and consequent) and is denoted as A ⇒ C, where A is the antecedent and C is the consequent. For example, an association rule, “laptop ⇒ speaker (80%),” states that four out of five customers that bought a laptop computer also bought speakers.

The first step in generating association rules is to review a database of transactions to identify meaningful patterns (referred to as frequent patterns, frequent sets or frequent item sets) in a transaction database, such as significant purchase patterns that appear as common patterns recurring among a plurality of customers. Typically, this is done by using constraint thresholds such as support and confidence parameters, or other guides to the data mining process. These guides are used to discover frequent patterns, i.e., all sets of item sets that have transaction support above a pre-determined minimum support S and confidence C threshold. Various techniques have been proposed to assist with identifying frequent patterns in transaction databases, including using “Apriori” algorithms to generate and test candidate sets, such as described by R. Agrawal et al., “Mining Association Rules Between Sets of Items in Large Databases,” Proceedings of ACM SIGMOD Int'l Conf. on Management of Data, pp. 207-216 (1993). However, candidate set generation is costly in terms of computational resources consumed, especially when there are prolific patterns or long patterns in the database and when multiple passes through potentially large candidate sets are required. Other techniques (such as described by J. Han et al., “Mining Frequent Patterns Without Candidate Generation,” Proceedings of ACM SIGMOD Int'l Conf. on Management of Data, pp. 1-12 (2000)) attempt to overcome these limitations by using a frequent pattern tree (FPTree) data structure to mine frequent patterns without candidate set generation (a process referred to as FPGrowth). With the FPGrowth approach, frequency pattern information is stored in a compact memory structure.

Once the frequent sets are identified, the association rules are generated by constructing the power set (set of all subsets) of the identified frequent sets, and then generating rules from each of the elements of the power set. For each rule, its meaningfulness (i.e., support, confidence, lift, etc.) is calculated and examined to see if it meets the required thresholds. For example, if a frequent pattern {A, B, C} is extracted (meaning that this set occurs more frequently than the minimum support S threshold in the set of transactions), then several rules can be generated from this set:

-   {A} ⇒ {B, C}
-   {B} ⇒ {A, C}
-   {C} ⇒ {A, B}
-   {A, B} ⇒ {C}
-   etc.

Here, a rule A ⇒ B indicates that “Product A is often purchased together with Product B,” meaning that there is an association between the sales of Products A and B. Such rules can be useful for decisions concerning product pricing, product placement, promotions, store layout and many other decisions.
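As an illustration of how rules can be enumerated from a single frequent set, the following minimal Python sketch splits a frequent set into every non-empty antecedent and its complementary consequent. The function name `rules_from_frequent_set` is hypothetical and is not taken from the description; threshold checks on support, confidence and lift are omitted.

```python
from itertools import combinations


def rules_from_frequent_set(items):
    """Yield (antecedent, consequent) pairs for one frequent set.

    Every non-empty proper subset of the set is used as an antecedent,
    and the remaining items form the consequent.
    """
    items = frozenset(items)
    for size in range(1, len(items)):
        for subset in combinations(sorted(items), size):
            antecedent = frozenset(subset)
            yield antecedent, items - antecedent


if __name__ == "__main__":
    for antecedent, consequent in rules_from_frequent_set({"A", "B", "C"}):
        print(sorted(antecedent), "=>", sorted(consequent))
```

For the frequent pattern {A, B, C}, this enumeration produces six candidate rules, each of which would still have to be tested against the required thresholds.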

Conventional data mining approaches use generic item descriptions, such as the SKU (stock keeping unit), when identifying items or products in a transaction database. When these generic descriptions are used to identify frequent sets, the frequent sets are not large and power-set/rule generation is tractable. However, conventional data mining techniques using item data at the SKU level do not provide sufficient information to develop meaningful association rules for complex products. For example, if there are three transactions involving the purchase of a computer identified as “Desktop-SKU” with one of the transactions also involving the purchase of DVD disks, the product level of description used to identify the computer does not reveal that two of the computers did not include DVD drives, while the third computer (which was purchased with the DVD disks) did include a DVD drive. As this example demonstrates, this lack of granularity in the item description diminishes the quality of association rules that can be generated, resulting in limited pattern correlation.

During the generation of association rules from frequent sets (for example, with algorithms such as FPGrowth), the number of generated rules (and processing time required to generate the rules) can become intractable as the number of frequent sets increases, often resulting in redundant rules being generated. An example of rule redundancy is rule subsumption, where a first rule R1 subsumes a second rule R2 whenever the consequents of R1 are a superset of the consequents of R2 (anything concluded by R2 is also concluded by R1), and the antecedents of R1 are satisfied in any context in which the antecedents of R2 are satisfied (antecedents of R1 are more general than the antecedents of R2). For example, with rules R1 and R2 (where R1: A ⇒ C, D, and R2: A, B ⇒ C, D), R1 subsumes R2. Other examples of rule redundancy include rules that provide trivial associations and rules with redundant antecedents. Conventional approaches for removing redundancy have not been effective. For example, when R1 subsumes R2, conventional association rule generation approaches (such as FPGrowth) would discard R2 if and only if the confidence of R1 is greater than or equal to the confidence of R2. For the most part, this confidence condition is rarely if ever met, as more general rules tend to have lower confidence. An article by Bayardo et al., entitled “Constraint-Based Rule Mining in Large, Dense Databases,” Proc. of the 15th Int'l Conf. on Data Engineering (1999), discusses a simple technique for applying rule subsumption when the subsumed rule has higher confidence, but this higher confidence does not meet an absolute minimum improvement threshold and is inflexibly applied.
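The subsumption test described above can be sketched in Python as follows. This is a minimal illustration that assumes each rule is a pair of item sets whose antecedent is a conjunction of items, so that "R1 is satisfied wherever R2 is satisfied" reduces to a subset check; the function name `subsumes` is hypothetical.

```python
def subsumes(r1, r2):
    """Return True if rule r1 subsumes rule r2.

    Each rule is an (antecedent, consequent) pair of frozensets.
    r1 subsumes r2 when r1 concludes everything r2 concludes and
    r1's antecedent is at least as general (a subset of r2's).
    """
    antecedent1, consequent1 = r1
    antecedent2, consequent2 = r2
    return consequent1 >= consequent2 and antecedent1 <= antecedent2


# R1: A => C, D   and   R2: A, B => C, D
R1 = (frozenset({"A"}), frozenset({"C", "D"}))
R2 = (frozenset({"A", "B"}), frozenset({"C", "D"}))

if __name__ == "__main__":
    print(subsumes(R1, R2))  # R1 is the more general rule
    print(subsumes(R2, R1))
```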

As seen from the conventional approaches, a need exists for methods and/or apparatuses for improving the extraction of frequent patterns for use in data mining. There is also a need for finer granularity in the generation of frequent sets to better discover meaningful patterns without imposing the cost of a combinatorial explosion of the data that must be examined. In addition, there is a need for methods and/or apparatuses for efficiently generating association rules without requiring unwieldy candidate set generation, without requiring multiple database passes and without requiring additional time to generate association rules as the frequent set grows. Moreover, there is a need for an improved method and system for removing redundant association rules that allow beneficial general rules to be retained without unduly increasing the size of the generated rule set. Further limitations and disadvantages of conventional systems will become apparent to one of skill in the art after reviewing the remainder of the present application with reference to the drawings and detailed description which follow.

SUMMARY OF THE INVENTION

In accordance with one or more embodiments of the present invention, a system and method are provided for generating more meaningful frequent set data by providing finer granularity in the item descriptions used to generate frequent sets. In a selected embodiment, improved pattern correlation is provided by representing items in terms of their features so that a part or product may be represented in terms of its part group and/or various attribute-value pairs. This approach provides sufficient detail so that association rule mining can be used for complex products. However, where attribute-based association rule mining produces a large number of rules, this number can be reduced in a systematic manner and still retain the characteristics of the original rule set, thereby improving performance of the rule set at runtime by reducing the number of rules that are evaluated. For example, any additional complexity resulting from the increase in the number of generated association rules may be addressed by modifying association rules to remove redundant antecedent part group items. Complexity may also be reduced by discarding redundant rules, such as rules providing only trivial associations. In addition, complexity may be reduced by removing rules that are subsumed by other rules, including specifically subsumed rules that have a higher confidence than the subsuming rule, provided that the confidence of the subsumed rule does not meet or exceed a specialization lift threshold, such as an adjustable lift threshold. Specialization lift acts as an increment above the confidence of the subsuming rule to determine when subsumed rules should be removed (if the confidence of the subsumed rule is below the threshold) or retained (if the confidence of the subsumed rule is at or above the threshold). In other words, a general rule should subsume a more specific rule if the more specific rule does not provide sufficient “lift,” where lift is a measure of increase in confidence. 
For example, suppose R1 subsumes R2, and R1 has 30% confidence and R2 has 35% confidence. If the specialization lift threshold calculated for this rule is greater than five percentage points, this would result in R2 being removed from the generated rule set. The loss of the marginally increased confidence of R2 is deemed negligible compared to the expense of managing the additional rule. When considering a specific value for a specialization lift, a value that decreases with increasing confidence of the subsuming rule provides a means for further managing the specialization lift heuristic. When the confidence of both the subsuming rule and subsumed rule are low, a larger value for specialization lift allows more low confidence rules to be filtered. As the confidence of the subsuming rule increases, fewer rules should be filtered. Thus, the improved rule generation process filters the generated rule set to identify subsumed rules using an adjustable threshold so that general rules are retained and more specific rules that provide little in terms of improved confidence are discarded.

The objects, advantages and other novel features of the present invention will be apparent from the following detailed description when read in conjunction with the appended claims and attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an exemplary data processing system for generating high-resolution product information that may be used for mining detailed frequent pattern information.

FIG. 2 illustrates selected exemplary flow methodologies for removing redundant rules from a set of generated association rules using an adjustable rule subsumption technique.

FIG. 3 depicts an exemplary system for mining attribute-based association rules from a transaction database.

FIG. 4 shows a flowchart schematically illustrating the process of finding frequent patterns using a frequent pattern tree and efficiently generating attribute-based association rules from the frequent pattern tree.

DETAILED DESCRIPTION

An efficient database mining method and apparatus is described for generating attribute-based frequent patterns from transaction databases, efficiently deriving association rules from the detailed frequent patterns, and removing redundancies from the derived rules. While various details are set forth in the following description, it will be appreciated that the present invention may be practiced without these specific details. For example, selected aspects are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention. Some portions of the detailed descriptions provided herein are presented in terms of algorithms or operations on data within a computer memory. Such descriptions and representations are used by those skilled in the data processing arts to describe and convey the substance of their work to others skilled in the art. In general, an algorithm refers to a self-consistent sequence of steps leading to a desired result, where a “step” refers to a manipulation of physical quantities which may, though need not necessarily, take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It is common usage to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. These and similar terms may be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. 
Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions using terms such as processing, computing, calculating, determining, displaying or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and/or transforms data represented as physical, electronic and/or magnetic quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

Referring now to FIG. 1, a block diagram illustrates an exemplary data processing system 10 in which the present invention may be practiced by mapping or otherwise transforming a first data set of transaction information into a second data set of transaction information that provides more detailed information specifying the attributes of the purchased product. In a selected application, the second data set of transaction information may be used to generate cross-sell and up-sell recommendations based on frequent patterns mined from an order history stored in a transaction database. By providing greater granularity to the transaction data, pattern correlation is improved by representing items in terms of their features so that a part or product may be represented in terms of its part group and various attribute-value pairs. For example, if there are three transactions involving the purchase of a computer with the third transaction also involving the purchase of DVD disks, by including an identification for each computer item of whether it includes a DVD drive (e.g., Computer.DVDrive.No for the first and second computers and Computer.DVDrive.Yes for the third computer), sufficient detail is provided to enable more accurate correlation between the computer and disk purchases when generating association rules.

In the example depicted in FIG. 1, the data processing system 10 (e.g., a private wide area network (WAN) or the Internet) includes a central server computer system 11 and one or more networked client or server computer systems 13 that are connected to the network. Communication between central server computer system 11 and the networked computer systems 13 typically occurs over a network, such as a public switched telephone network over asymmetric digital subscriber line (ADSL) telephone lines or high-bandwidth trunks, for example, communications channels providing T1 or OC3 service. Networked client computer systems (e.g., 13) typically access central server computer system 11 through a service provider, such as an Internet service provider (“ISP”), by executing application specific software, commonly referred to as a browser, on the networked client computer systems 13.

In a selected embodiment, a first data set of transaction information is stored in a database 14 that may be accessed directly or indirectly by the server 11. In this example, the first data set identifies the items included in a plurality of transactions by including a generic product descriptor 16, 18 for each transaction item, such as the SKU (stock keeping unit) for a purchased product. Thus, a hard drive that was purchased is identified with the hard drive SKU 16 and a desktop computer is identified with the desktop SKU 18. In accordance with an embodiment of the present invention described herein, the first data set of transaction information may be mapped or otherwise transformed into a second data set of transaction information that provides more detailed information identifying with greater specificity the attributes of the purchased product. In a selected embodiment, the data transformation is implemented with a computer or other data processing functionality (e.g., server 11) which loads a copy of the first data set 16, 18 from a database 14 into local memory 15, as indicated with arrow 20. Using a product detail knowledge database (such as contained in product data memory 2) that specifies various product feature details for each transaction item, the server 11 maps or transforms the generic product descriptors of the first data set into a second data set that specifies additional details and/or features for the item of interest, such as more detailed product descriptor information. In the depicted embodiment, part numbers in an order (e.g., 16, 18) may be mapped to a PartGroup identifier and to a set of attribute names and values (e.g., 23, 27, respectively) and stored in the database 14, as indicated with arrow 21.

With reference to the example depicted in FIG. 1, an 80Gb, 7200 RPM, SCSI drive identified with the HD-SKU 16 could be mapped to the following attribute-based transaction items:

-   _Hard Drive
-   _Hard Drive_Size.80Gb
-   _Hard Drive_RPM.7200
-   _Hard Drive_Interface.SCSI
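The mapping from a generic descriptor to attribute-based items can be sketched as a simple lookup. In the following minimal Python illustration, the `PRODUCT_DETAILS` table and the `expand_item` function are hypothetical stand-ins for the product detail knowledge database, and the item naming pattern is an assumption modeled on the list above.

```python
# Hypothetical product detail table: SKU -> (part group, attribute values).
PRODUCT_DETAILS = {
    "HD-SKU": ("Hard Drive", {"Size": "80Gb", "RPM": "7200", "Interface": "SCSI"}),
}


def expand_item(sku):
    """Expand one generic SKU into a part group item plus attribute items."""
    part_group, attributes = PRODUCT_DETAILS[sku]
    items = [f"_{part_group}"]
    for name, value in attributes.items():
        items.append(f"_{part_group}_{name}.{value}")
    return items


if __name__ == "__main__":
    for item in expand_item("HD-SKU"):
        print(item)
```

A single transaction item thus becomes one part group item and one item per attribute-value pair, which is what allows frequent sets to be mined at the attribute level.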

These items are included in a second data set 22 as an entry 23-26 which quantifies the consumer preferences for one or more products and associated product features and which is organized or stored in a structured format, such as a database or table. In this example, the original item description 16 is now expanded and represented by a PartGroup identifier 23 and three attribute items 24, 25, 26. In similar fashion, the original item description 18 for a desktop computer is expanded and represented by a PartGroup identifier 27 and seven attribute items 28-34 (in this example) that are stored as an entry in the second data set 22. These additional attribute items 28-34 specify the processor speed 28, processor class 29, operating system type 30, hard drive size 31, optical drive type 32, software package type 33, and monitor type 34 for the desktop item.

While the additional product detail information contained in the second data set has many potentially useful and interesting applications, it can be used in transaction database applications to provide more meaningful frequent pattern analysis. As will be appreciated by those of ordinary skill in the art, frequent patterns or itemsets may be constructed using data mining techniques to find interesting patterns from databases, such as association rules, correlations, sequences, episodes, classifiers, clusters and the like. The task of discovering and storing all frequent patterns from a database of items is quite challenging, given that the search space is exponential in the number of items occurring in the database. For example, FPTree and FPGrowth techniques may be used to generate association rules using a compact in-memory representation of the transaction database, such as described in U.S. patent application Ser. No. 10/870,360, entitled “Attribute-based Association Rule Mining,” filed Jun. 17, 2004, and assigned to Trilogy Development Group, which is hereby incorporated by reference in its entirety. However, it is contemplated that other rule generation algorithms, including but not limited to an Apriori algorithm and its many variants, may also be used to generate rules in keeping with the present invention.

For example, association rules may be generated from the expanded second data set of transaction information that is included as part of a transaction database 40, as indicated with entries 40 a-g in FIG. 1. An important consideration with data mining applications is the representation of the transaction database 40. Conceptually, such a database can be represented by a binary two-dimensional matrix in which every row (e.g., 40 a) represents an individual transaction (with a transaction identifier (e.g., TID 100)) and the columns represent the items in the transaction (e.g., f, a, c, d, g, i, m, p). Such a matrix can be implemented in several ways. The most commonly used layout is the horizontal data layout. That is, each transaction has a transaction identifier and a list of items occurring in that transaction. Another commonly used layout is the vertical data layout, in which the database consists of a set of items, each followed by its cover (the set of transactions in which the item occurs).
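The two layouts can be contrasted with a short Python sketch. The transaction contents below are hypothetical and chosen only for illustration; `to_vertical` is an assumed helper name, and the cover of an item is taken to be the set of transaction identifiers in which it occurs.

```python
from collections import defaultdict

# Horizontal layout: transaction identifier -> items in that transaction.
# These two transactions are made-up illustrative data.
horizontal = {
    100: ["f", "a", "c"],
    200: ["a", "c", "m"],
}


def to_vertical(horizontal_db):
    """Convert a horizontal layout into a vertical one (item -> cover)."""
    cover = defaultdict(set)
    for tid, items in horizontal_db.items():
        for item in items:
            cover[item].add(tid)
    return dict(cover)


if __name__ == "__main__":
    for item, tids in sorted(to_vertical(horizontal).items()):
        print(item, sorted(tids))
```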

In the example of FIG. 1, the server 11 begins the process of generating association rules 43 by retrieving the item descriptors from the transaction database 40 and a minimum support count 46, as indicated with arrow 41. As indicated with arrow 42, the server 11 identifies all items in the database 40 with a frequency meeting or exceeding the minimum support count requirement (e.g., the minimum support count is 3), and uses a rule generator 4 (depending on the rule generation algorithm used) to generate a plurality of association rules 43 a-f. Each association rule has a support and confidence metric that is calculated by the server 11. For example, the support metric 44 is determined by the number of times the rule is supported in the transaction database 40, and the confidence metric 45 is determined by the percentage of times the antecedent of the rule leads to the consequent.
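The support and confidence metrics can be computed directly from a list of transactions. The following minimal Python sketch uses hypothetical function names and made-up transactions; it reproduces the "four out of five" laptop and speaker example from the background discussion rather than any data from FIG. 1.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    wanted = set(itemset)
    return sum(1 for transaction in transactions if wanted <= set(transaction))


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also
    contain the consequent (0.0 if the antecedent never occurs)."""
    antecedent_count = support_count(transactions, antecedent)
    if antecedent_count == 0:
        return 0.0
    both = support_count(transactions, set(antecedent) | set(consequent))
    return both / antecedent_count


# Illustrative data: five laptop purchases, four of them with speakers.
transactions = [["laptop", "speaker"]] * 4 + [["laptop"], ["mouse"]]

if __name__ == "__main__":
    print(support_count(transactions, ["laptop", "speaker"]))
    print(confidence(transactions, ["laptop"], ["speaker"]))
```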

Simultaneously with or subsequent to the generation of the association rules 43, the server 11 may also be configured to filter the generated rules 43 to identify redundant rules that are candidates for removal or modification, such as by removing rules providing only trivial associations, removing rules that have only a part group as a consequent, modifying rules with redundant antecedent items to remove a redundant antecedent part group, or using a relaxed rule subsumption technique that may be flexibly calculated based on the confidence rating of the subsuming or subsumed rule. In particular, it may be advantageous to retain a more general association rule and to remove an association rule that is subsumed by the general association rule, even where the subsumed rule has a higher confidence than the general rule. For example, general rules may act as defaults that are applied if no more specific rule applies. However, if general rules are retained that have a confidence that is too low in relation to the subsumed rule, this can greatly increase the size of the generated rule set. Various embodiments of the present invention address this by flexibly calculating a specialization lift threshold for managing the size of the generated rule set when the confidence threshold is reduced.

By providing a scaled specialization lift threshold, a general rule is allowed to subsume a more specific rule (which may then be discarded) if the more specific rule does not provide sufficient “lift,” where lift is a measure of increase in confidence of the subsumed rule over the subsuming rule. In accordance with selected embodiments of the present invention, the use of a scaled specialization lift threshold allows a larger lift threshold requirement to be applied to low confidence rules in order to avoid subsumption, and allows smaller lift threshold requirements to be applied to higher confidence rules to avoid subsumption. For example, a general rule with 30% confidence might subsume any more specific rules whose confidence is less than 50%, but not those more specific rules whose confidence is above 50%. But for higher confidence rules, the required lift may be adjusted to require a smaller lift threshold to avoid subsumption. For example, a general rule with 85% confidence would only subsume any more specific rule up to 88% confidence.

In accordance with the present invention, lift may be scaled in a variety of ways to provide an adjustable lift threshold that is determined as a function of the confidence of the subsuming rule and/or even as a function of the confidence of the subsumed rule. For example, the lift may be scaled in a linear fashion by first determining a complement of the confidence of the subsuming rule, such as by subtracting the confidence percentage of the subsuming rule from 100 percent. A lift parameter value between 0 and 1 may then be applied to the complement value to calculate a linear scaled specialization lift threshold. With such a scaled threshold, only subsumed rules exceeding the threshold would be retained, but subsumed rules that fall below the threshold would be discarded from the generated rules. In an alternative embodiment, the required lift may be scaled in a non-linear fashion, such as by subtracting the confidence of the subsuming rule from 100%, squaring the difference, and making this the lift required in confidence increase for a more specific rule to avoid subsumption. For example, a subsuming rule with confidence of 50% would require that a more specific rule must improve confidence by 25% (0.5 squared) to avoid subsumption. Of course, other flexible scaling techniques may be implemented to adjust the lift threshold as a function of the confidence of the subsuming or subsumed rule.

Turning now to FIG. 2, exemplary flow methodologies are illustrated for removing redundant rules from a set of generated association rules using an adjustable rule subsumption technique which allows a specialization lift threshold to be adjusted based on the confidence of the subsuming rule. Though selected examples of how to calculate the specialization lift threshold are illustrated in FIG. 2, it will be appreciated by those of ordinary skill in the art that any linear or non-linear calculation algorithm may be used to control the magnitude of the specialization lift threshold so that the threshold decreases as the confidence of the subsuming rule increases. These steps may be performed for each rule in the set of generated association rules to identify subsumed rules that may be removed from the generated set. In addition, it will be appreciated that the methodology of the present invention may be thought of as performing the identified sequence of steps in the order depicted in FIG. 2, though the steps may also be performed in parallel, in a different order, or as independent operations that separately calculate the specialization lift threshold and apply the threshold to the subject rule(s).

The description of the method can begin at step 200, where a first rule (e.g., rule R1) is determined to have subsumed a second rule (e.g., rule R2). While the mechanics of the subsumption determination can be accomplished in a variety of ways, there are essentially two requirements for a rule subsumption determination. First, the consequents of the first rule (e.g., subsuming rule R1) are determined to be a superset of the consequents of the second rule (e.g., subsumed rule R2). The second requirement is that the antecedents of the first rule (e.g., subsuming rule R1) are satisfied in any context in which the antecedents of the second rule (e.g., subsumed rule R2) are satisfied. For example, rule R1 subsumes rule R2 when anything concluded by R2 is also concluded by R1, and when the antecedents of R1 are more general than the antecedents of R2.

Once it is determined that a first rule subsumes a second rule, the process of calculating a specialization lift threshold begins. As described herein, any calculation algorithm that provides for an adjustable threshold may be used in connection with the present invention, including but not limited to linear scaling algorithms and non-linear scaling algorithms. When considering a specific value for a specialization lift, a value that decreases with increasing confidence of the subsuming rule provides a means for further managing the specialization lift heuristic. When the confidence of both the subsuming rule and subsumed rule are low, a larger value for specialization lift allows more low confidence rules to be filtered. As the confidence of the subsuming rule increases, fewer rules should be filtered.

In a selected embodiment, the specialization lift threshold (SL) is calculated at step 202 as a simple linear scaled function of the confidence of the subsuming rule R1 by applying a scaling factor between 0 and 1 (e.g., 0.2) to the difference between the confidence of the subsuming rule and 100% (e.g., SL=(100%−R1confidence)*0.2). Of course, other scaling factors can be used at step 202. In an alternative embodiment, a higher order relationship can force the threshold to be very small for high confidence rules, and still large for low confidence rules. For example, the adjustable specialization lift threshold (SL) may be calculated at step 206 as a non-linear scaled function of the confidence of the subsuming rule R1 by applying a scaling factor between 0 and 1 (e.g., 0.2) to the square of the difference between the confidence of the subsuming rule and 100% (e.g., SL=(100%−R1confidence)*(100%−R1confidence)*0.2). Again, other scaling factors can be used to calculate the non-linear scaled specialization lift.
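The two threshold calculations can be sketched in Python as follows. This is a minimal illustration that assumes confidences are expressed as fractions between 0 and 1 (so that 30% is 0.30), which is one reading of the percentages in the formulas; the function names are hypothetical.

```python
def linear_lift_threshold(subsuming_confidence, scale=0.2):
    """Linear scaling: SL = (1 - confidence) * scale."""
    return (1.0 - subsuming_confidence) * scale


def squared_lift_threshold(subsuming_confidence, scale=0.2):
    """Non-linear scaling: SL = (1 - confidence)^2 * scale."""
    complement = 1.0 - subsuming_confidence
    return complement * complement * scale


if __name__ == "__main__":
    for conf in (0.30, 0.50, 0.85):
        print(
            f"confidence={conf:.2f} "
            f"linear={linear_lift_threshold(conf):.4f} "
            f"squared={squared_lift_threshold(conf):.4f}"
        )
```

Under these assumptions, a subsuming rule with 30% confidence yields a linear threshold of about 14 percentage points with a 0.2 scaling factor, while one with 85% confidence yields about 3 percentage points. With a scaling factor of 1, the squared form gives a 25% threshold for a 50% confidence rule, and both forms shrink as the confidence of the subsuming rule grows.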

Once the adjustable specialization lift threshold (SL) is calculated, this threshold is used to determine if the confidence of the subsumed rule provides sufficient lift. If the subsumed rule provides sufficient lift, it is retained; otherwise, it is discarded. As will be appreciated, the sufficiency of lift may be evaluated with reference to the adjustable specialization lift threshold being met or exceeded, or may alternatively be evaluated with reference to the adjustable specialization lift threshold being exceeded, as the case may be. In the illustrative embodiment, if the confidence of the subsumed rule (e.g., R2) exceeds the confidence of the subsuming rule (e.g., R1) by at least the threshold amount SL (affirmative outcome to decision 204), the subsumed rule (e.g., R2) is retained. However, if the confidence of the subsumed rule (e.g., R2) does not exceed the confidence of the subsuming rule (e.g., R1) by at least the threshold amount SL (negative outcome to decision 204), the subsumed rule (e.g., R2) may be discarded.

Once it has been determined that the subsumed rule does not provide sufficient lift to be retained (negative outcome to decision 204), it may be determined if the first and second rules are the same rule by determining at step 210 if the second rule (e.g., R2) subsumes the first rule (e.g., R1). Alternatively, the determination of whether the second rule subsumes the first rule may also be made earlier in the process (e.g., immediately after step 200). If the second rule subsumes the first rule (affirmative outcome to decision 210), this means that the two rules are the same, in which case the first rule (e.g., R1) may be removed (at step 212). If the second rule does not subsume the first rule (negative outcome to decision 210), then the second rule (e.g., R2) may be discarded from the generated set (step 214).
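The decision flow of steps 200 through 214 can be sketched as shown below. The `Rule` type, the function names, and in particular the subsumption test (same consequent, with the antecedent of the subsuming rule contained in the antecedent of the subsumed rule) are assumptions made for illustration; they are not taken from the passage above, which does not define subsumption. The linear threshold with a "met or exceeded" comparison is used.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float  # fraction in [0, 1]


def subsumes(general: Rule, specific: Rule) -> bool:
    # Assumed definition: identical consequent and the general rule's
    # antecedent is a subset of the specific rule's antecedent.
    return (general.consequent == specific.consequent
            and general.antecedent <= specific.antecedent)


def rule_to_discard(r1: Rule, r2: Rule, scale: float = 0.2) -> Optional[Rule]:
    """Return the rule to remove for the pair (r1, r2), or None to keep both."""
    if not subsumes(r1, r2):                  # step 200: r1 must subsume r2
        return None
    sl = (1.0 - r1.confidence) * scale        # step 202: linear threshold
    if r2.confidence - r1.confidence >= sl:   # decision 204: sufficient lift
        return None
    if subsumes(r2, r1):                      # decision 210: same rule
        return r1                             # step 212
    return r2                                 # step 214
```

For example, with a scaling factor of 0.175, a subsuming rule at 60% confidence and a subsumed rule at 66% confidence, the sketch returns the subsumed rule for removal, since the 6% lift falls short of the 7% threshold.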

There are other rule set simplification techniques which may be implemented in accordance with the present invention. In accordance with a selected embodiment, the association rules that are generated may be processed to identify and remove any rules providing only trivial associations between the antecedent and consequent. For example, an association rule that has an antecedent in the same part group as the consequent (e.g., [_Monitor, _Operating_System] ⇒ [_Monitor.Type.3]) provides only a trivial association, and may be removed from the mined rules.

Rule simplification may also be improved by modifying the existing mined rules to remove redundant antecedent information from the rules. For example, the association rules that are generated may be processed to identify rules having a part group item that is redundant to another item in the antecedent. An example of such a rule would be [_Optical_Drive.Type.100, _Operating_System, _Optical_Drive] ⇒ [_Monitor.Type.4]. When such a rule is identified, it may be modified to remove the antecedent part group (_Optical_Drive) from the antecedent.

Rule simplification may also be improved by removing any rule that includes only a part group item as the consequent. The elimination of such rules removes or reduces some inference direction ambiguities.
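The three simplification checks described above can be sketched as follows. The sketch assumes, purely for illustration, that a part group item is written as a bare name (e.g., `_Optical_Drive`) and an attribute item as `Group.Attribute.Value` (e.g., `_Optical_Drive.Type.100`), so that the part group of any item is the text before the first period. The function names are hypothetical.

```python
from typing import Iterable, List


def part_group(item: str) -> str:
    # Assumed naming convention: "Group" or "Group.Attribute.Value".
    return item.split(".", 1)[0]


def is_part_group_item(item: str) -> bool:
    return "." not in item


def is_trivial(antecedent: Iterable[str], consequent: Iterable[str]) -> bool:
    """True if some antecedent item shares a part group with the consequent."""
    consequent_groups = {part_group(c) for c in consequent}
    return any(part_group(a) in consequent_groups for a in antecedent)


def drop_redundant_part_groups(antecedent: Iterable[str]) -> List[str]:
    """Remove a part group item when an attribute item of that group is present."""
    items = list(antecedent)
    attribute_groups = {part_group(a) for a in items if not is_part_group_item(a)}
    return [a for a in items
            if not (is_part_group_item(a) and a in attribute_groups)]


def has_part_group_only_consequent(consequent: Iterable[str]) -> bool:
    """True if every consequent item is a bare part group item."""
    return all(is_part_group_item(c) for c in consequent)
```

Under these assumptions, a rule is dropped when `is_trivial` or `has_part_group_only_consequent` is true, and otherwise its antecedent is replaced by the output of `drop_redundant_part_groups`.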

As will be appreciated by those of ordinary skill in the art, the removal or simplification of redundant rules from a generated set of association rules is only part of the process of generating association rules. In particular, association rule mining algorithms typically require two steps: identifying all frequent patterns (also referred to as frequent sets) that satisfy a minimum support requirement, and generating all association rules that satisfy a minimum confidence requirement using the identified frequent patterns. The second step, generating association rules, may be accomplished by generating the power set of the frequent set (the set of all possible subsets) and then calculating, for each rule derivable from the members of the power set, the support, confidence, lift or other indicia of meaningfulness to determine if the rule meets the required thresholds. Once the generated rules meeting the required thresholds are identified, redundant rules may be removed by, for example, removing rules providing only trivial associations, modifying rules to remove redundant antecedent items and/or filtering subsumed rules from the generated rule set that do not provide sufficient lift to meet the specialization lift threshold requirements.
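As a rough sketch of the second step, the following enumerates the non-empty proper subsets of one frequent set as candidate antecedents and keeps the rules whose confidence meets a minimum. The function names and the tuple representation of a rule are hypothetical, and support is recomputed by scanning the transactions for simplicity; a practical implementation would more likely reuse the support counts already produced by the frequent pattern step.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

RuleTuple = Tuple[FrozenSet[str], FrozenSet[str], float]


def support(itemset: FrozenSet[str], transactions: Sequence[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rules_from_frequent_set(frequent_set: FrozenSet[str],
                            transactions: Sequence[FrozenSet[str]],
                            min_confidence: float) -> List[RuleTuple]:
    items = sorted(frequent_set)
    full = frozenset(items)
    full_support = support(full, transactions)
    rules: List[RuleTuple] = []
    # Each non-empty proper subset is a candidate antecedent; the remaining
    # items of the frequent set form the consequent.
    for size in range(1, len(items)):
        for chosen in combinations(items, size):
            antecedent = frozenset(chosen)
            antecedent_support = support(antecedent, transactions)
            if antecedent_support == 0:
                continue
            confidence = full_support / antecedent_support
            if confidence >= min_confidence:
                rules.append((antecedent, full - antecedent, confidence))
    return rules
```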

FIG. 3 depicts an exemplary system for efficiently generating attribute-based association rules mined from frequent patterns identified in a transaction database. In FIG. 3, the system 300 comprises a data processing engine or unit 334 coupled to a storage database 301 that stores a transaction database 302. The system 300 also includes an input device 320 where at least one condition of the association rules to be mined is inputted by a user. For example, the input device 320 is used to input the conditions (i.e., support, confidence, lift, etc.) for the association rule to be mined. An attribute mapper 322 is also included for mapping a first data set to a second, highly granular data set as described herein. In addition, a frequent pattern generator 324 is included for identifying frequent patterns occurring in the transaction database 302. For example, the frequent pattern generator 324 may use FPGrowth techniques to identify frequent patterns in the transaction database 302 meeting the minimum support count inputted by the user. At a general level, a rule generator 326 is included for generating association rules from the frequent pattern information, and an output device 336 is also provided for outputting the mined association rules. The storage database 301 may be connected to the attribute mapper 322, frequent pattern generator 324 and/or rule generator 326. Alternatively, transaction data from the storage database 301 may be transformed by the attribute mapper 322, passed directly to the frequent pattern generator 324 for processing to identify frequent patterns, and then passed to the rule generator 326 for rule generation.

The attribute mapper 322 is provided for transforming generic item descriptors in the transaction database to provide more detailed item description information concerning various product attributes and/or qualities for the item. For example, part number information may be mapped into more granular product or attribute information identifying specific features of the product, where the specific product or attribute information may be presented as native or numeric values. In addition, the mapping function may transform the product information to include more general information for the product, such as a PartGroup or other generalized identifier for the product. Each of the transformed descriptors may be treated as separate items for use with the data mining techniques described herein to provide improved pattern correlation based on the more specific attribute information contained in the transaction data.
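A mapping of this kind might be sketched as a simple lookup, as shown below. Everything concrete here is hypothetical: the part numbers, the catalog contents, the `Group.Attribute.Value` item naming and the function names are invented for illustration and do not come from the described system.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical catalog: part number -> (part group, attribute name/value pairs).
Catalog = Dict[str, Tuple[str, Dict[str, str]]]

EXAMPLE_CATALOG: Catalog = {
    "PN-1001": ("_Monitor", {"Type": "3"}),
    "PN-2002": ("_Optical_Drive", {"Type": "100", "Speed": "48"}),
}


def map_part_number(part_number: str, catalog: Catalog) -> List[str]:
    """Expand one part number into a part group item plus attribute items."""
    group, attributes = catalog[part_number]
    items = [group]
    items += [f"{group}.{name}.{value}" for name, value in sorted(attributes.items())]
    return items


def map_transaction(part_numbers: Sequence[str], catalog: Catalog) -> List[str]:
    """Map every part number in a transaction; each result is a separate item."""
    mapped: List[str] = []
    for part_number in part_numbers:
        mapped.extend(map_part_number(part_number, catalog))
    return mapped
```

A single ordered part thus becomes several items, which is the source of the growth in transaction size discussed next.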

At the frequent pattern generator 324, all of the frequent patterns from the transaction database 302 are compiled, and the support of each frequent pattern may be obtained. As will be appreciated, the use of attribute-based representations as items in a database results in a combinatorial explosion in the quantity of frequent pattern information that is output by the frequent pattern generator 324. For example, by expanding generic items into multiple attribute/value items, the transaction size of the frequent patterns may increase by four to five times. Using the example transaction database 302 depicted in FIG. 3, the conventional approach for identifying products might only have a single item for transaction TID 100, but by expanding the items to include attribute values, the transaction TID 100 includes eight items: f, a, c, d, g, l, m and p.

At the rule generator 326, a preliminary rule set 304 of association rules (e.g., R1-R6) is derived by using the frequent pattern information provided by the frequent pattern generator 324. A broad variety of efficient algorithms for mining association rules have been developed in recent years, including algorithms based on the level-wise Apriori framework, TreeProjection and FPGrowth algorithms. While there are techniques for reducing the processing resources required by the rule generation algorithms (such as described in the incorporated U.S. patent application Ser. No. 10/870,360, entitled “Attribute-based Association Rule Mining”), the use of attribute-based items for the transaction database 302 can still result in the generation of large rule sets 304 that include redundant rules and/or associations. As the size of the generated set of association rules increases, the time required to apply the generated association rules to obtain purchase recommendations also increases. As a result, the available data mining techniques still in many cases have high processing times leading to increased I/O and CPU costs.

Various embodiments of the present invention may be applied to remove and/or modify redundant rules, thereby reducing the size or complexity of the preliminary rule set 304 to form a final rule set 306. In accordance with a selected embodiment, the rule generator 326 includes a redundant association detector 328 which may be used to identify generated rules containing redundant associations that may be modified and/or removed from the preliminary rule set 304. For example, if the redundant association detector 328 determines that an association rule (e.g., R2) provides only a trivial association between its antecedent (e.g., [f, c, a]) and its consequent (e.g., [p]), then the association rule would not be included in the final rule set 306. An example of such a trivial association would occur when the consequent item (e.g., [p]) belongs to a part group specified by an antecedent item (e.g., [f]).

In accordance with an alternative embodiment, the redundant association detector 328 may also identify a rule in the preliminary rule set 304 that includes an antecedent item that is redundant of other antecedent items in the rule. Such rules with redundant antecedents may be modified to remove the redundancy from the antecedent. For example, if the redundant association detector 328 determines that an association rule (e.g., R2) has an antecedent part group item (e.g., item [f]) that is redundant of another antecedent item (e.g., item [a]) in the rule, then the rule generator 326 would modify the association rule to remove the redundant part group item (e.g., item [f]). Though not depicted in FIG. 3, this approach would result in the final rule set 306 including a rule R2 defined as [c, a] ⇒ [p].

In accordance with yet another embodiment, where the preliminary rule set 304 includes a first rule that subsumes a second rule, the present invention enables the second rule to be removed from the preliminary rule set 304 if the confidence of the second rule does not provide sufficient lift over the confidence of the first rule. The requirement of “sufficient lift” may be determined with reference to an adjustable specialization lift threshold value that is calculated by a specialization lift calculator 330. Any subsumed rule not meeting the calculated specialization lift threshold requirement may be removed from the preliminary rule set 304 by the exclusion module 332 so that it is not included in the final rule set 306. For example, suppose R3 subsumes R2, and R3 has 60% confidence and R2 has 66% confidence. Setting the specialization lift threshold greater than 6% would result in R2 being removed from the generated rule set. The increased confidence of R2 is deemed negligible compared to the expense of managing the additional rule. Note that R2 should be removed only if R2 does not also subsume R3. If R2 does subsume R3, then R2 and R3 are the same rule, in which case R3 should be removed since it is truly redundant.

As an additional or alternative approach, the specific value for a specialization lift threshold may be calculated at the calculator 330 as a function of the confidence of the subsuming rule so that the value decreases with increasing confidence of the subsuming rule. With this approach, when the confidences of both the subsuming rule and subsumed rule are low, a larger value for specialization lift allows more low confidence rules to be filtered. As the confidence of the subsuming rule increases, fewer rules should be filtered. A simple linear relationship can be used, or a higher order relationship can be used to force the threshold to be very small for high confidence rules, and still large for low confidence rules.

With reference to the preliminary rule set 304 depicted in FIG. 3, an illustrative linear calculation of the specialization lift threshold (SL=(100%−(Subsuming Rule Confidence))×0.175) may be applied to the rules R1-R6 to identify subsumed rules that may be removed from the rule set. With this equation, the required lift relative to a subsuming rule with 60% confidence would be 7%, and the required lift relative to a subsuming rule with 50% confidence would be 8.75%. For example, once it is determined that R3 subsumes R2, the relative confidences are compared using the calculated specialization lift threshold equation, SL = (100%−R3 confidence)×0.175 = (100%−60%)×0.175 = 40%×0.175 = 7%. Since the subsumed rule R2 (with confidence 66%) provides a lift of only 6% over R3 and therefore does not provide the required lift, rule R2 may be removed from the final rule set 306. On the other hand, once it is determined that R4 subsumes R3, the specialization lift threshold is calculated for the subsuming rule R4: SL = (100%−R4 confidence)×0.175 = (100%−50%)×0.175 = 50%×0.175 = 8.75%. Since the subsumed rule R3 (with confidence 60%) provides a lift of 10%, which meets the required lift over subsuming rule R4, rule R3 is retained in the final rule set 306.
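The arithmetic in this example can be checked with a few lines of code. The sketch below uses exact fractions so that percentages such as 8.75% are represented without rounding; the names `required_lift` and `keep_subsumed` are hypothetical, and the comparison assumes the "met or exceeded" reading of sufficient lift.

```python
from fractions import Fraction

SCALE = Fraction(175, 1000)  # the illustrative scaling factor 0.175


def required_lift(subsuming_confidence: Fraction) -> Fraction:
    """SL = (100% - subsuming rule confidence) x 0.175, as a fraction of 1."""
    return (1 - subsuming_confidence) * SCALE


def keep_subsumed(subsuming_confidence: Fraction,
                  subsumed_confidence: Fraction) -> bool:
    """True if the subsumed rule's lift meets or exceeds the threshold."""
    lift = subsumed_confidence - subsuming_confidence
    return lift >= required_lift(subsuming_confidence)


if __name__ == "__main__":
    r2, r3, r4 = Fraction(66, 100), Fraction(60, 100), Fraction(50, 100)
    print(float(required_lift(r3)), keep_subsumed(r3, r2))  # R3 subsumes R2
    print(float(required_lift(r4)), keep_subsumed(r4, r3))  # R4 subsumes R3
```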

As will be appreciated, other equations may be used to calculate the specialization lift threshold. For example, the threshold may be reduced for high confidence rules with the calculation equation, SL=(100%−(Subsuming Rule Confidence))×(100%−(Subsuming Rule Confidence))×0.2. Using this approach, the required lift relative to a subsuming rule with confidence 20% would be 12.8%; the required lift relative to a subsuming rule with 50% confidence would be 5%; and the required lift relative to a subsuming rule with 80% confidence would be 0.8%.

In an exemplary embodiment, the system and method for efficiently generating association rules may be implemented with a data processing system that processes transaction database information to provide a frequent set with attribute-based items identifying the purchased product, and to more efficiently generate association rules from the generated frequent set. For example, data processing may be performed on computer system 10 which may be found in many forms including, for example, mainframes, minicomputers, workstations, servers, personal computers, internet terminals, notebooks, wireless or mobile computing devices (including personal digital assistants), embedded systems and other information handling systems, which are designed to provide computing power to one or more users, either locally or remotely. A computer system 10 includes one or more microprocessors or central processing units (CPU) 12, mass storage memory 14 and local RAM memory 15. The processor 12, in one embodiment, is a 32-bit or 64-bit microprocessor manufactured by Motorola, such as the 680x0 processor, or a microprocessor manufactured by Intel, such as the 80x86 or Pentium processor, or by IBM. However, any other suitable single or multiple microprocessors or microcomputers may be utilized. Computer programs and data are generally stored as instructions and data in mass storage 14 until loaded into main memory 15 for execution. Main memory 15 may be comprised of dynamic random access memory (DRAM). As will be appreciated by those skilled in the art, the CPU 12 may be connected directly (or through an interface or bus) to a variety of peripheral and system components, such as a hard disk drive, cache memory, traditional I/O devices (such as display monitors, mouse-type input devices, floppy disk drives, speaker systems, keyboards, hard drive, CD-ROM drive, modems, printers), network interfaces, terminal devices, televisions, sound devices, voice recognition devices, electronic pen devices, and mass storage devices such as tape drives, hard disks, compact disk (“CD”) drives, digital versatile disk (“DVD”) drives, and magneto-optical drives. The peripheral devices usually communicate with the processor over one or more buses and/or bridges. Thus, persons of ordinary skill in the art will recognize that the foregoing components and devices are used as examples for the sake of conceptual clarity and that various configuration modifications are common.

Turning now to FIG. 4, an exemplary flow methodology is illustrated for finding frequent patterns using a frequent pattern tree and mining attribute-based association rules from the frequent pattern tree. As will be appreciated, the methodology illustrated in FIG. 4 shows the steps for generating attribute-based items, for using an FPTree to identify a frequent set using FPGrowth techniques, for generating association rules from the attribute-based items in the frequent set and for filtering out subsumed rules that do not provide sufficient lift. These steps may be performed for each entry in a transaction database to expedite the generation of association rules having improved pattern correlation. In addition, it will be appreciated that the methodology of the present invention may be thought of as performing the identified sequence of steps in the order depicted in FIG. 4, though the steps may also be performed in parallel, in a different order, or as independent operations that separately identify and store a frequent set and generate association rules therefrom.

The description of the method can begin at step 401, where an item corresponding to an ordered part number is retrieved from a first data set in a transaction database. Each retrieved part number is transformed or mapped into a PartGroup item and/or attribute value pairs at step 403, thereby creating a second data set. For a given database, each retrieved part number is transformed until the mapping is complete (affirmative outcome to decision 405).

Once the second data set is complete, the process of constructing an FPTree begins at step 407 by making a first pass through the second (mapped) data set to obtain a frequency count for each item in the second data set. For items identified at step 409 that are above the minimum support threshold requirement, a header table is built at step 411 that lists the items and frequency count in descending order. Next, a second pass through the second data set is made at step 413 to build the FPTree by sorting the frequent items in a transaction based on header table order, merging the transactions with identical item sets and having transactions with the same prefix share the same path in the tree. At the conclusion of step 413, the FPTree data structure is completed and stored in memory, and may be used with the FPGrowth technique to generate the frequent patterns or frequent set.
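The two-pass construction of steps 407 through 413 can be sketched as follows. This is a simplified illustration, not the described implementation: the `Node` class and `build_fptree` function are hypothetical names, the minimum support is taken as an absolute count that an item must meet or exceed, and ties in frequency are broken alphabetically so that the ordering is deterministic.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


class Node:
    """One FPTree node: an item, its count and its child nodes."""

    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item = item
        self.parent = parent
        self.count = 0
        self.children: Dict[str, "Node"] = {}


def build_fptree(transactions: Sequence[Sequence[str]],
                 min_support_count: int
                 ) -> Tuple[Node, Dict[str, List[Node]], List[str]]:
    # First pass (step 407): frequency count for each item.
    counts = Counter(item for t in transactions for item in set(t))
    # Steps 409-411: keep frequent items, ordered by descending count.
    frequent = {i: c for i, c in counts.items() if c >= min_support_count}
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: position for position, item in enumerate(order)}
    header: Dict[str, List[Node]] = {item: [] for item in order}

    # Second pass (step 413): insert each transaction's frequent items in
    # header-table order so that shared prefixes share one path.
    root = Node(None, None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, order
```

The header table here maps each frequent item to the list of tree nodes carrying that item, which is what a subsequent FPGrowth pass would follow to collect prefix paths.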

In particular, the FPGrowth technique selects an item at step 415, finds all prefix patterns in the FPTree for that item at step 417 and uses the prefix paths to build a conditional FPTree at step 419. With the conditional FPTree, frequent patterns are mined for each selected item (step 423), and the process repeats until all items have been processed (affirmative outcome to decision 423). At this point (which corresponds to the output of frequent pattern generator 324 in FIG. 3), the process of generating rules from the frequent set may begin. However, by virtue of transforming the first data set to represent items in terms of their features (such as part groups and attribute-value pairs) to provide sufficient detail so that association rule mining can be used for complex products, the representation at this level results in a combinatorial explosion in the amount of data that the rule generation algorithms must examine. To manage that combinatorial explosion, selected attribute and part group items may be consolidated during the subset generation process. As illustrated at step 427, the PartGroup items may be excluded from the frequent set, at least where the frequent set already contains attribute values corresponding to the PartGroup. In addition or in the alternative, all item references for the same part group may be considered as a unit by inserting a proxy value in their place during the power set generation process (as indicated at step 429). Once the attribute values are consolidated, the power set is generated based on the consolidated proxy values (step 431). After the subsets are generated, the subsets are expanded by replacing the proxy values with the original attribute values, and then the threshold tests are applied to generate association rules that meet the threshold requirements (step 433).
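The consolidation of steps 427 through 433 can be sketched as below. The sketch assumes, for illustration only, that items follow a `Group` or `Group.Attribute.Value` naming pattern and that the part group name itself serves as the proxy value; the function names are hypothetical, and the threshold tests of step 433 are omitted.

```python
from itertools import chain, combinations
from typing import Dict, FrozenSet, Iterable, List


def part_group(item: str) -> str:
    # Assumed naming convention: "Group" or "Group.Attribute.Value".
    return item.split(".", 1)[0]


def consolidated_subsets(frequent_set: Iterable[str]) -> List[FrozenSet[str]]:
    # Group all items of the frequent set by part group.
    groups: Dict[str, List[str]] = {}
    for item in sorted(frequent_set):
        groups.setdefault(part_group(item), []).append(item)

    # Step 427: drop a bare PartGroup item when attribute items for the
    # same part group are already present.
    for group, members in groups.items():
        if len(members) > 1 and group in members:
            members.remove(group)

    # Steps 429-431: one proxy per part group, then the power set of the
    # proxies (non-empty subsets only).
    proxies = sorted(groups)
    proxy_subsets = chain.from_iterable(
        combinations(proxies, size) for size in range(1, len(proxies) + 1))

    # Step 433 (first half): expand each proxy back to its original items.
    return [frozenset(item for proxy in subset for item in groups[proxy])
            for subset in proxy_subsets]
```

With four items drawn from two part groups, this produces 3 subsets rather than the 15 non-empty subsets of the unconsolidated set, which is the reduction the consolidation is meant to achieve.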

The generated association rules may then be filtered to remove redundant rules by processing the generated rule set to identify any rule that subsumes another rule (step 435). For each identified subsumed rule pair, an adjustable specialization lift threshold is calculated (step 437). For example, the threshold may be calculated as a function of the confidence of the subsuming rule in the subsumed rule pair, and/or may be calculated as a function of the confidence of the subsumed rule in the subsumed rule pair. By applying the calculated adjustable specialization lift threshold to each subsumed rule pair, subsumed rules not providing the confidence lift specified by the threshold may be discarded from the generated set of association rules (step 439).
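Steps 435 through 439 can be sketched over a whole rule set as follows. Several choices here are assumptions rather than details from the text: a rule is a tuple of antecedent, consequent and confidence; one rule is taken to subsume another when the consequents match and its antecedent is a proper subset of the other's; the linear threshold is used; and a rule marked for removal is still allowed to act as a subsuming rule for other pairs. The antecedents in the usage example are invented; only the confidences follow the R2, R3 and R4 example above.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

RuleTuple = Tuple[FrozenSet[str], FrozenSet[str], float]


def filter_subsumed(rules: Sequence[RuleTuple], scale: float = 0.2) -> List[RuleTuple]:
    discarded: Set[int] = set()
    for i, (general_ante, general_cons, general_conf) in enumerate(rules):
        for j, (specific_ante, specific_cons, specific_conf) in enumerate(rules):
            if i == j:
                continue
            # Step 435: rule i subsumes rule j (assumed definition).
            if general_cons == specific_cons and general_ante < specific_ante:
                threshold = (1.0 - general_conf) * scale   # step 437
                if specific_conf - general_conf < threshold:
                    discarded.add(j)                       # step 439
    return [rule for index, rule in enumerate(rules) if index not in discarded]


if __name__ == "__main__":
    p = frozenset({"p"})
    r4 = (frozenset({"a"}), p, 0.50)
    r3 = (frozenset({"a", "c"}), p, 0.60)
    r2 = (frozenset({"a", "c", "f"}), p, 0.66)
    print(filter_subsumed([r2, r3, r4], scale=0.175))
```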

The above-discussed embodiments include software that performs certain tasks. The software discussed herein may include script, batch, or other executable files. The software may be stored on a machine-readable or computer-readable storage medium, and is otherwise available to direct the operation of the computer system as described herein and claimed below. In one embodiment, the software uses a local or database memory to implement the data transformation and data structures so as to improve the generation of attribute-based rules. The local or database memory used for storing firmware or hardware modules in accordance with an embodiment of the invention may also include a semiconductor-based memory, which may be permanently, removably or remotely coupled to a microprocessor system. Other new and various types of computer-readable storage media may be used to store the modules discussed herein. Additionally, those skilled in the art will recognize that the separation of functionality into modules is for illustrative purposes. Alternative embodiments may merge the functionality of multiple software modules into a single module or may impose an alternate decomposition of functionality of modules. For example, a software module for calling sub-modules may be decomposed so that each sub-module performs its function and passes control directly to another sub-module.

The computer-based data processing system described above is for purposes of example only, and may be implemented in any type of computer system or programming or processing environment, or in a computer program, alone or in conjunction with hardware. Various embodiments of the present invention may also be implemented in software stored on a computer-readable medium and executed as a computer program on a general purpose or special purpose computer. For clarity, only those aspects of the system germane to the invention are described, and product details well known in the art are omitted. For the same reason, the computer hardware is not described in further detail. It should thus be understood that the invention is not limited to any specific computer language, program, or computer. It is further contemplated that the present invention may be run on a stand-alone computer system, or may be run from a server computer system that can be accessed by a plurality of client computer systems interconnected over an intranet network, or that is accessible to clients over the Internet. In addition, many embodiments of the present invention have application to a wide range of industries including the following: computer hardware and software manufacturing and sales, professional services, financial services, automotive sales and manufacturing, telecommunications sales and manufacturing, medical and pharmaceutical sales and manufacturing, and construction industries.

Although the present invention has been described in detail, it is not intended to limit the invention to the particular form set forth, but on the contrary, is intended to cover such alternatives, modifications and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims so that those skilled in the art should understand that they can make various changes, substitutions and alterations without departing from the spirit and scope of the invention in its broadest form.

1-24. (canceled)

25. A computer-based method for generating association rules to extract useful information for use in market analysis applications, comprising: identifying a subsuming rule in a first set of association rules having a first confidence metric value; and transforming the first set of association rules into a second set of association rules at a computer system by removing one or more association rules from the first set of association rules which are subsumed by the subsuming rule and which have computed confidence metric values that do not exceed the first confidence metric value by more than a scaled lift threshold value.

26. The method of claim 25, further comprising obtaining the first set of association rules by: transforming a first data set of product identifiers representing a plurality of physical or tangible objects into a second data set by mapping each product identifier into a plurality of product attribute identifiers for storage in the second data set; and generating the first set of association rules from the second data set, where each association rule in the first set of association rules comprises data representing an association relationship between a plurality of physical or tangible products.

27. The method of claim 25, further comprising storing the second set of association rules in a memory storage database.

28. The method of claim 25, further calculating the scaled lift threshold value that is scaled as a function of the first confidence metric value.

29. The method of claim 28, where calculating the scaled lift threshold value comprises computing a scaled lift threshold value as a non-linear function of the first confidence metric value.

30. The method of claim 28, where calculating the scaled lift threshold value comprises computing a scaled lift threshold value by determining a complement of the first confidence metric value and multiplying the complement by a scaling factor between 0 and 1.

31. The method of claim 28, where comprising calculating the scaled lift threshold value that is scaled as a function of the first confidence metric value by determining a complement of the first confidence metric value, squaring the complement to obtain a squared value and multiplying the squared value by a scaling factor.

32. The method of claim 28, where calculating the scaled lift threshold value comprises computing a scaled lift threshold value that decreases as the first confidence metric value increases.

33. The method of claim 28, where calculating the scaled lift threshold value comprises computing a scaled lift threshold value that is larger for a relatively small first confidence metric value and is smaller for a relatively large first confidence metric value.

34. The method of claim 25, where transforming the first set of association rules comprises retaining one or more association rules from the first set of association rules in the second set of association rules which are subsumed by the subsuming rule and which have computed confidence metric values that exceed the first confidence metric value by more than the scaled lift threshold value.

35. An article of manufacture having at least one recordable medium having stored thereon executable instructions and data which, when executed by at least one processing device, generate association rules to extract useful information from a database for use in market analysis applications by causing the at least one processing device to: identify a subsuming rule in a first set of association rules having a first confidence metric value; and transform the first set of association rules into a second set of association rules by removing one or more association rules from the first set of association rules which are subsumed by the subsuming rule and which have computed confidence metric values that do not exceed the first confidence metric value by more than a scaled lift threshold value.

36. The article of manufacture of claim 35, wherein the executable instructions and data cause the at least one processing device to obtain the first set of association rules by: transforming a first data set of product identifiers representing a plurality of physical or tangible objects into a second data set by mapping each product identifier into a plurality of product attribute identifiers for storage in the second data set; and generating the first set of association rules from the second data set, where each association rule in the first set of association rules comprises data representing an association relationship between a plurality of physical or tangible products.

37. The article of manufacture of claim 35, wherein the executable instructions and data cause the at least one processing device to calculate the scaled lift threshold value that is scaled as a function of the first confidence metric value.

38. The article of manufacture of claim 37, wherein the scaled lift threshold value is calculated as a non-linear function of the first confidence metric value.

39. The article of manufacture of claim 37, wherein the scaled lift threshold value is calculated by determining a complement of the first confidence metric value, squaring the complement to obtain a squared value and multiplying the squared value by a scaling factor between 0 and 1.

40. The article of manufacture of claim 37, wherein the scaled lift threshold value decreases as the first confidence metric value increases.

41. A system for mining attribute-based association rules to extract useful information from a database for use in market analysis applications, comprising: a database for storing a first set of association rules; and a processing engine for transforming the first set of association rules into a second set of association rules by identifying a subsuming rule in the first set of association rules having a first confidence metric value and removing one or more association rules from the first set of association rules which are subsumed by a subsuming rule in the first set of association rules and which have computed confidence metric values that do not exceed a first confidence metric value for the subsuming rule by more than a scaled lift threshold value.

42. The system of claim 41, where the processing engine calculates the scaled lift threshold value that is scaled as a function of the first confidence metric value.

43. The system of claim 42, where the processing engine calculates the scaled lift threshold value by determining a complement of the first confidence metric value, squaring the complement to obtain a squared value and multiplying the squared value by a scaling factor.

44. The system of claim 42, where the processing engine calculates the scaled lift threshold value to decrease as the first confidence metric value increases.

45. The system of claim 41, where the processing engine transforms the first set of association rules by retaining one or more association rules that are subsumed by the subsuming rule and which have confidence metric values which exceed the first confidence metric value by at least the scaled lift threshold value.

46. The system of claim 41, where the processing engine transforms the first set of association rules by discarding a third association rule if the third association rule provides a trivial association.

47. The system of claim 41, wherein the scaling factor is a scaling factor between 0 and 1.